As per last week, download and unzip today’s data into a new folder and open the file “tutorial_10.Rproj”. This will open RStudio and make sure that you don’t have to worry about file paths. Make sure that you unzip everything into the same folder.
Next, try running the following code by copying and pasting (Ctrl+C, Ctrl+V) it into your R session’s console.
# The following code makes sure that the packages below (like sub-programs that R uses to do things) are installed and attached, installing any that are missing
for (package in c('lme4', 'lmerTest', 'tidyverse', 'foreign', 'readr', 'sjstats')) {
  if (!require(package, character.only = TRUE, quietly = TRUE)) {
    install.packages(package)
    library(package, character.only = TRUE)
  }
}
The data for this week’s exercise is the same as last week’s - it is an organizational psychology data-set described in Klein et al. (2000), and is stored as siop_merged.csv.
The file siop_merged.csv contains 750 employee-level observations nested within 50 work-groups. Ignoring the group ID variable (grpid), there are eight standardized variables, where a higher score indicates a higher level (e.g., higher pay or more negative leadership behaviors). These data were collected by surveying individual employees, so they measure individual-level perceptions, except for the variable “physen”, which is the group-level “physical work environment”.
Table 1. The names and descriptions of the variables included in siop_merged.csv.
The first thing we need to do is to read in the data. Copy and paste the following into your R console to load today’s data.
siop_merged <- read.csv("siop_merged.csv")
The first variables we’ll be dealing with today are “Job satisfaction”, “Positive affect” and “Physical work environment (group level)”. As always, it’s a good idea to plot your data. We have three variables here, so you can either plot them with facets or use a 3D plot. Click and drag the plot below to have a look at your data:
Figure 1. A 3D scatterplot of Job satisfaction, positive affect and group Physical environment, color indicates group membership.
What is the relationship between positive affect and job satisfaction? Is there a clear directional relationship?
Let’s fit a random intercept model with one individual level predictor - we’ll start with positive affect - and one group level predictor - physical work environment.
This model with random intercepts for group, a single level 1 predictor (positive affect) and a single group level predictor may be written:
\(JobSat_{ij} = \gamma_{00} + \gamma_{10} PosAff_{ij}+ \gamma_{01}PhysEn_{0j} + u_{0j} + \epsilon_{ij}\)
This means each person’s job satisfaction is equal to the overall intercept (\(\gamma_{00}\)), a regression term times their positive affect (\(\gamma_{10}(PosAff_{ij})\)), a regression term times the group level predictor (\(\gamma_{01}PhysEn_{0j}\)), a random effect for their group (\(u_{0j}\)), plus an error term (\(\epsilon_{ij}\)) (i.e., the residual, as the MLM will not perfectly predict their score). You can run this model with the following code:
PA_PE_model <- lmer("jobsat ~ 1 + posaff + physen + (1 | grpid)", data = siop_merged)
| Terms | Meaning |
|---|---|
| jobsat ~ | job satisfaction is predicted by |
| 1 + | An intercept parameter which is the same for everyone |
| posaff | a regression coefficient times each person’s positive affect |
| physen | a regression coefficient times each group’s physical environment |
| (1 \| grpid) | A random intercept for each group |
This model says that the relationship between positive affect and job satisfaction is the same in every group, but that the baseline - the intercept - can vary by group, and that we can predict the group intercepts using the group’s score on physical environment. To visualize this, we can plot job satisfaction by positive affect and draw a regression line for each group. The model does not actually estimate a separate intercept for each group, but rather assumes that the intercepts come from a normal distribution with a mean equal to the overall intercept and a variance as given in the random-effects part of the model output. The output we will often be interested in is the variance of these groups’ intercepts, as opposed to their individual values.
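To make the “intercepts come from a normal distribution” idea concrete, here is a minimal base-R simulation of a data set generated by exactly this kind of model. The coefficient values and standard deviations are made up purely for illustration:

```r
set.seed(1)
n_groups <- 50
n_per    <- 15

# Made-up fixed effects (gamma00 = intercept, gamma10 = posaff, gamma01 = physen)
gamma00 <- 0; gamma10 <- 0.4; gamma01 <- 0.3

# Each group's intercept deviation is drawn from ONE normal distribution
u0 <- rnorm(n_groups, mean = 0, sd = 0.75)

grp    <- rep(1:n_groups, each = n_per)      # group membership
posaff <- rnorm(n_groups * n_per)            # individual-level predictor
physen <- rnorm(n_groups)[grp]               # group-level predictor, constant within group
eps    <- rnorm(n_groups * n_per, sd = 2.3)  # individual residuals

# The model equation, term by term
jobsat <- gamma00 + gamma10 * posaff + gamma01 * physen + u0[grp] + eps
```

The multilevel model works in reverse: given data like `jobsat`, it estimates the fixed effects plus the two variances (of `u0` and of `eps`) rather than the 50 individual intercepts.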
Remember, summary(PA_PE_model) is useful for getting a summary of your model output!
summary(PA_PE_model)
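If you want to see the estimated intercept for each group (not just their variance), lme4’s ranef() and coef() functions pull these out. Here is a self-contained sketch using lme4’s built-in sleepstudy data, so it runs even without today’s file; the analogous calls for today’s model are ranef(PA_PE_model)$grpid and coef(PA_PE_model)$grpid.

```r
library(lme4)

# A random-intercept model on lme4's built-in sleepstudy data
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

ranef(m)$Subject   # each subject's deviation from the overall intercept
coef(m)$Subject    # overall intercept plus deviation, per subject
```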
Figure 2. Job satisfaction by positive affect, lines show the estimated relationship by group, color indicates group membership.
What is the overall relationship between positive affect and job satisfaction? Which value from your model summary tells us about this value?
\(JobSat_{ij} = \gamma_{00} + \gamma_{10} PosAff_{ij}+ \gamma_{01}PhysEn_{0j} + u_{0j} + \epsilon_{ij}\)
Looking at the equation above and comparing it to your model summary, can you identify which part of the model summary output corresponds to each part of the equation?
Now we will include a random slope for the association between positive affect and job satisfaction as well as a group level predictor for the intercept, but without other predictors for the slope. This means that positive affect becomes both a fixed and a random effect.
This model with random intercepts and a random slope for group, a single group level predictor (physical work environment), and a single level 1 predictor (positive affect) may be written:
\(JobSat_{ij} = \gamma_{00} + \gamma_{10} PosAff_{ij} + \gamma_{01} PhysEn_{0j} + u_{0j} + u_{1j}PosAff_{ij} + \epsilon_{ij}\)
We can break this equation down into two parts, the fixed and random effects.
The fixed effects say that each observed data point is predicted by an overall intercept (\(\gamma_{00}\)), a regression term times their positive affect (\(\gamma_{10} PosAff_{ij}\)), and a regression term times the group-level predictor (\(\gamma_{01} PhysEn_{0j}\)).
The random effects say that each group gets a random effect times their positive affect, giving us random slopes (\(u_{1j}PosAff_{ij}\)), a random intercept for their group (\(u_{0j}\)), plus an error term for the individual (\(\epsilon_{ij}\)) (i.e., the residual, or difference between predicted and observed scores).
RS_model <- lmer("jobsat ~ 1 + posaff + physen + (1 + posaff | grpid)", data = siop_merged)
| Terms | Meaning |
|---|---|
| jobsat ~ | job satisfaction is predicted by |
| 1 + | An intercept parameter which is the same for everyone |
| posaff | a regression coefficient times each person’s positive affect |
| physen | a regression coefficient times each group’s physical environment |
| (1 + posaff \| grpid) | A random intercept and a random slope for the relationship between positive affect and job satisfaction for each group |
This model says that the relationship between positive affect and job satisfaction differs between groups by some estimated amount, and that we can predict the groups’ true mean scores using their group’s physical environment. Conceptually, this model is better thought of as assuming that the group slopes and intercepts come from a normal distribution with some overall mean (i.e., the fixed effects for the intercept and slope) and an estimated variance. The output we’re often interested in, above and beyond the fixed effects, is the variance of these groups’ intercepts and the variance of the groups’ slopes.
Remember, summary(RS_model) is useful for getting a summary of your model output!
summary(RS_model)
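To see the per-group slopes themselves (rather than only their variance), coef() and VarCorr() are handy. Another self-contained sketch on lme4’s built-in sleepstudy data; for today’s model the analogous calls are coef(RS_model)$grpid and VarCorr(RS_model).

```r
library(lme4)

# Random intercepts AND random slopes for Days
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

head(coef(m)$Subject)  # per-subject intercepts and slopes now both vary
VarCorr(m)             # variances (and correlation) of the random effects
```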
To visualize this, we can again plot job satisfaction by positive affect and draw a regression line for each group. The regression line slopes and intercepts change by group, because we now have random slopes and intercepts, allowing the relationship between positive affect and job satisfaction to change by group.
Figure 3. Job satisfaction by positive affect, lines show the estimated relationship by group, color indicates group.
What are we saying about the relationship between job satisfaction and positive affect by including random slopes? Do you think that this claim is plausible?
Let’s fit a random intercepts and slopes model with one individual level predictor - again, positive affect - one group level predictor - physical work environment - and one group level predictor of the random slopes (a cross-level interaction).
This model with random slopes and intercepts for group, a single level 1 predictor (positive affect), a single group level predictor (physical work environment), and an interaction between positive affect and physical environment may be written:
\(JobSat_{ij} = \gamma_{00} + \gamma_{10} PosAff_{ij} + \gamma_{01} PhysEn_{0j} + \gamma_{11} (PosAff_{ij} \times PhysEn_{0j}) + u_{0j} + u_{1j}PosAff_{ij} + \epsilon_{ij}\)
This means each person’s job satisfaction is equal to the overall intercept (\(\gamma_{00}\)), a regression term times their positive affect (\(\gamma_{10} PosAff_{ij}\)), a regression term times the group level predictor (\(\gamma_{01} PhysEn_{0j}\)), a regression term times the cross-level interaction (\(\gamma_{11} (PosAff_{ij} \times PhysEn_{0j})\)), a random effect times the individual level predictor giving us random slopes (\(u_{1j}PosAff_{ij}\)), a random effect for group (\(u_{0j}\)), plus an error term for the individual (\(\epsilon_{ij}\)).
You can run the model in R by copying and pasting the following into your session (make sure you’ve loaded siop_merged first):
RS_model_2 <- lmer("jobsat ~ 1 + physen + posaff + posaff * physen + (1 + posaff | grpid)", data = siop_merged)
| Terms | Meaning |
|---|---|
| jobsat ~ | job satisfaction is predicted by |
| 1 + | An intercept parameter which is the same for everyone |
| posaff | a regression coefficient times each person’s positive affect |
| physen | a regression coefficient times each group’s physical environment |
| physen:posaff | a regression coefficient times the product of physical environment and positive affect |
| (1 + posaff \| grpid) | A random intercept and a random slope for the relationship between positive affect and job satisfaction for each group |
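A side note on the formula above: in R, a * b expands to a + b + a:b, so writing posaff + physen + posaff * physen lists the main effects twice; R silently drops the duplicates, so the fitted model is unchanged. A quick self-contained check with model.matrix() (the variables a and b here are just toy data):

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# a * b already includes both main effects plus the interaction
colnames(model.matrix(~ a * b, d))          # "(Intercept)" "a" "b" "a:b"

# Listing the main effects again adds no new columns
colnames(model.matrix(~ a + b + a * b, d))  # identical to the above
```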
summary(RS_model_2)
This model says that the relationship between positive affect and job satisfaction differs between groups by some estimated amount, that we can predict the groups’ true mean scores using their group’s physical environment, and that the relationship between positive affect and job satisfaction is influenced by the physical environment.
Remember, summary(RS_model_2) is useful for getting a summary of your model output!
To visualize this, we can again plot job satisfaction by positive affect and draw a regression line for each group. The regression line slopes and intercepts still change by group because we have random slopes and intercepts; however, we are now trying to predict each group’s slope with the variable physical environment.
Figure 4. Job satisfaction by positive affect, lines show the estimated relationship by group, color indicates group.
What are we saying when we include the interaction term in our model?
Does it look like it’s worth including this term in the model?
Hint: How does the variance of the random effect for positive affect (\(u_{1j}\)) change between models?
Looking at each of the models, which do you think is best? (Take a look at summary(PA_PE_model) and summary(RS_model).) Do we even need random slopes?
Hint: Compare the amount of residual variance in RS_model to the amount seen in PA_PE_model. Does the increased complexity of the random slopes model seem worth the reduction in residual variance?
summary(PA_PE_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: "jobsat ~ 1 + posaff + physen + (1 | grpid)"
## Data: siop_merged
##
## REML criterion at convergence: 3434.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8823 -0.6362 -0.0291 0.6512 3.5698
##
## Random effects:
## Groups Name Variance Std.Dev.
## grpid (Intercept) 0.5592 0.7478
## Residual 5.3205 2.3066
## Number of obs: 750, groups: grpid, 50
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.03007 0.13555 48.11620 0.222 0.8254
## posaff 0.42838 0.08857 737.38622 4.836 1.61e-06 ***
## physen 0.27909 0.11266 47.96028 2.477 0.0168 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) posaff
## posaff -0.041
## physen -0.060 0.004
summary(RS_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: "jobsat ~ 1 + posaff + physen + (1 + posaff | grpid)"
## Data: siop_merged
##
## REML criterion at convergence: 3433.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8807 -0.6458 -0.0215 0.6618 3.4053
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## grpid (Intercept) 0.54739 0.7399
## posaff 0.04955 0.2226 0.39
## Residual 5.27734 2.2972
## Number of obs: 750, groups: grpid, 50
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.02677 0.13469 47.90243 0.199 0.8433
## posaff 0.43395 0.09414 41.12751 4.610 3.88e-05 ***
## physen 0.26624 0.11121 46.82901 2.394 0.0207 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) posaff
## posaff 0.065
## physen -0.059 0.004
summary(RS_model_2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## "jobsat ~ 1 + physen + posaff + posaff * physen + (1 + posaff | grpid)"
## Data: siop_merged
##
## REML criterion at convergence: 3436.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8686 -0.6580 -0.0291 0.6608 3.3994
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## grpid (Intercept) 0.53866 0.7339
## posaff 0.05875 0.2424 0.39
## Residual 5.27738 2.2973
## Number of obs: 750, groups: grpid, 50
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.02581 0.13408 47.76886 0.192 0.848
## physen 0.27583 0.11158 47.89985 2.472 0.017 *
## posaff 0.42868 0.09552 41.23772 4.488 5.67e-05 ***
## physen:posaff 0.05560 0.07700 37.62615 0.722 0.475
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) physen posaff
## physen -0.059
## posaff 0.073 -0.007
## physen:psff -0.006 0.131 -0.085
What is the relationship between positive affect and job satisfaction? Is there a clear directional relationship?
Taking a look at Figure 1, it looks like positive affect is positively correlated with job satisfaction.
What is the overall relationship between positive affect and job satisfaction? Which value from your model summary tells us about this value?
This model suggests that there is a positive relationship between positive affect and job satisfaction: the fixed effect estimate for posaff tells us that an increase of one unit of positive affect leads to a predicted increase of 0.428 units of job satisfaction.
\(JobSat_{ij} = \gamma_{00} + \gamma_{10} PosAff_{ij}+ \gamma_{01}PhysEn_{0j} + u_{0j} + \epsilon_{ij}\)
Looking at the equation above and comparing it to your model summary, can you identify which part of the model summary output corresponds to each part of the equation?
What are we saying about the relationship between job satisfaction and positive affect by including random slopes? Do you think that this claim is plausible?
We are saying that the relationship between positive affect and job satisfaction may differ by work group.
What are we saying when we include the interaction term in our model?
We are saying that we can explain some of the variance of the relationship between job satisfaction and positive affect using the physical work environment. I.e., that the physical environment may change (moderate) the strength of the relationship between positive affect and job satisfaction.
Does it look like it’s worth including this term in the model?
Hint: How does the variance of the random effect for positive affect (\(u_{1j}\)) change between models?
The variance of the random slope for positive affect actually increases - 0.04955 without the interaction, going to 0.05875 with the interaction. This is something that can never happen when adding terms in normal multiple regression, but can under rare circumstances in multilevel models.
You do not need to know why, but this paper (http://www.stat.columbia.edu/~gelman/stuff_for_blog/adding.pdf) gives as readable an explanation as is available! In the context of interactions it is very difficult to work out what this result implies, and I recommend spending your precious thinking time elsewhere.
Looking at each of the models, which do you think is best? (Take a look at summary(PA_PE_model) and summary(RS_model).) Do we even need random slopes?
Hint: Compare the amount of residual variance in RS_model to the amount seen in PA_PE_model. Does the increased complexity of the random slopes model seem worth the reduction in residual variance?
There is a very small reduction in residual variance when including the random slopes, going from 5.3205 to 5.27734, a reduction of less than 1% (i.e., \(((5.3205 - 5.27734)/5.3205) \times 100\)). However, this term may make enough sense on purely conceptual grounds that excluding it from the model would be ill-advised. Without an in-depth understanding of the theoretical context it’s difficult to make concrete suggestions.
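A more formal way to ask “do we need random slopes?” is a likelihood-ratio test via anova(), which refits both models with ML and compares them. A self-contained sketch on lme4’s built-in sleepstudy data; for today’s models the analogous call would be anova(PA_PE_model, RS_model).

```r
library(lme4)

# The same model with and without a random slope for Days
m0 <- lmer(Reaction ~ Days + (1 | Subject),        data = sleepstudy)
m1 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

anova(m0, m1)  # refits with ML and reports AIC, BIC, and a chi-squared test
```

One caveat: p-values for testing variance components this way are conservative, because the null value (zero variance) sits on the boundary of the parameter space.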